Modeling, Evaluating, and Improving the Performance of Supercomputer Scheduling

Authors

  • Dan Tsafrir
  • Dror G. Feitelson
Abstract

The most popular scheduling policy for parallel systems is FCFS with backfilling (a.k.a. “EASY” scheduling), where short jobs are allowed to run ahead of their time provided they do not delay previously queued jobs (or at least the first queued job). This requires users to provide estimates of how long their jobs will run, and jobs that exceed these estimates are killed so as not to violate subsequent commitments. The de-facto standard for evaluating the impact of inaccurate estimates on performance has been to use a “badness factor” f ≥ 0, such that given a runtime r, the associated estimate is uniformly distributed in [r, r · (f + 1)], or is simply r · (f + 1). The underlying assumption was that bigger fs imply worse information. Surprisingly, inaccurate estimates (f > 0) yield better performance than accurate ones (f = 0), a fact that has repeatedly produced statements like “inaccurate estimates actually improve performance” or “what the scheduler doesn’t know won’t hurt it” in many independent studies. This has promoted the perception that estimates are “unimportant”. At the same time, other studies noted that real user estimates are inaccurate, and that system-generated predictions based on history can do better. But predictions were never incorporated into production schedulers, partially due to the aforementioned perception that inaccuracy actually helps, partially because the suggested predictors were too complex, and partially because underprediction is technically unacceptable: users will not tolerate jobs being killed just because system predictions were too short. All attempts to solve the latter technicality yielded algorithms that are inappropriate for many supercomputing settings (e.g. using preemption, or assuming all jobs are restartable). This work has four major contributions.
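The badness-factor model described above can be sketched as follows (a minimal illustration; the function name and signature are ours, not from the original studies):

```python
import random

def f_model_estimate(runtime, f, deterministic=False, rng=random):
    """Generate a runtime estimate under the "badness factor" model (f >= 0).

    The estimate is drawn uniformly from [runtime, runtime * (f + 1)],
    or is simply runtime * (f + 1) in the deterministic variant.
    f = 0 yields perfectly accurate estimates.
    """
    if f < 0:
        raise ValueError("badness factor f must be non-negative")
    if deterministic:
        return runtime * (f + 1)
    return rng.uniform(runtime, runtime * (f + 1))
```

Note that even f = 0 assumes perfect information; real user estimates, as argued below, look nothing like either variant of this model.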
First, we show that the “inaccuracy helps” common wisdom is merely an unwarranted artifact of the erroneous manner in which inaccurate estimates have been modeled, and that increased accuracy actually improves performance. Specifically, the previously observed improvements turn out to be due to a “heel and toe” dynamic that, with f > 0, causes backfilling to approximate shortest-job-first scheduling. We show that multiplying estimates by a factor amounts to trading off fairness for performance, and that this reasoning holds regardless of whether the values being multiplied are actual runtimes (“perfect estimates”) or the flawed estimates supplied by users. We further show that the more accurate the values we multiply, the better the resulting performance. Thus, better estimates actually improve performance, and multiplying is in fact a scheduling policy that exercises the fairness/performance tradeoff. Regardless, multiplying is anything but representative of real inaccuracy, as outlined next.

Our second contribution is developing a more representative model of estimates that will henceforth allow for a valid evaluation of the effect of inaccurate estimates. It is largely based on noting that human users repeatedly use the same “round” values (ten minutes, one hour, etc.) and on the invariant that 90% of the jobs use the same 20 estimates. Importantly, the most popular estimate is typically the maximal allowed. As a result, the jobs associated with this estimate cannot be backfilled, and indeed, the more this value is used, the more EASY resembles plain FCFS. Thus, to artificially increase the inaccuracy one should, e.g., associate more jobs with the maximum (a realistic manipulation), not multiply by a greater factor (a bogus boost of performance).
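The modal-estimates idea can be illustrated by drawing from a small pool of round values and reserving the largest share for the maximum allowed. The specific values and the probability below are illustrative placeholders, not the fitted model from this work:

```python
import random

# Illustrative pool of "round" estimates in seconds (15 min, 30 min, 1 h, ...);
# the real model fits the popularity of roughly 20 such values per workload log.
ROUND_VALUES = [900, 1800, 3600, 7200, 14400, 43200]

def modal_estimate(runtime, max_estimate, p_max=0.25, rng=random):
    """Pick a round, user-style estimate that is at least `runtime`.

    With probability p_max (or whenever no round value fits), return the
    maximal allowed estimate -- typically the single most popular value.
    Jobs carrying the maximal estimate can never be backfilled.
    """
    candidates = [v for v in ROUND_VALUES if runtime <= v < max_estimate]
    if not candidates or rng.random() < p_max:
        return max_estimate
    return rng.choice(candidates)
```

Under this kind of model, raising p_max is the realistic way to increase inaccuracy: it pushes EASY toward plain FCFS, rather than delivering the bogus performance boost of multiplying by a larger factor.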
Our third contribution exploits the above understandings to devise a new scheduler that automatically improves the quality of estimates and puts this to productive use in the context of EASY, while preserving its attractively simple batch essence and refraining from any unacceptable assumptions. Specifically, the problem of underprediction is solved by divorcing kill-time from the runtime prediction.


Similar resources

Modeling and scheduling no-idle hybrid flow shop problems

Although several papers have studied no-idle scheduling problems, they all focus on flow shops, assuming one processor at each working stage. However, companies commonly extend flow shops into hybrid flow shops by duplicating machines in parallel at some stages. This paper considers the problem of scheduling no-idle hybrid flow shops. A mixed integer linear programming model is first developed to mathematically form...


A modified branch and bound algorithm for a vague flow-shop scheduling problem

Uncertainty plays a significant role in the modeling and optimization of real-world systems. Among approaches to uncertainty, fuzziness describes impreciseness, while ambiguity requires a different treatment. Vagueness is a probabilistic model of uncertainty that is helpful for including ambiguity when modeling different processes, especially in industrial systems. In this paper, a vague set based on dista...


Scheduling Nurse Shifts Using Goal Programming Based on Nurse Preferences: A Case Study in an Emergency Department

Nowadays, nurse scheduling is one of the most important challenges that health care centers encounter. The significance of nurses’ work quality has led researchers to be concerned about scheduling problems, which have an impact on nurses’ performance. Observing the interests of the hospital and patients, providing for their satisfaction, and meeting their needs are among the main objective...


Incorporation of Demand Response Programs and Wind Turbines in Optimal Scheduling of Smart Distribution Networks: A Case Study

Smart distribution networks (SDNs) play a significant role in future power networks. Accordingly, the optimal scheduling of such networks, which includes the planning of consumer and production sections, has received considerable attention in recent research studies. In this paper, the optimal planning of energy and reserve of SDNs has been studied. Technical constraints of the distribution network and power gen...


Maximizing the nurses’ preferences in nurse scheduling problem: mathematical modeling and a meta-heuristic algorithm

The nurse scheduling problem (NSP) has received a great deal of attention in recent years. In the NSP, the goal is to assign shifts to nurses so as to satisfy the hospital’s demand over the planning horizon, considering different objective functions. In this research, we focus on maximizing the nurses’ preferences for working shifts and weekends off by considering several important...




Publication date: 2006